
3.3.2 Hypothesis test
The hypothesis tests on the residuals cover heteroscedasticity, normality, and independence, and
check whether the regression model’s assumptions hold.
Heteroscedasticity Test: Conducted to check whether the variance of the residuals is constant across
the range of predicted values.
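The report does not say which heteroscedasticity test was run. As one possibility, the sketch below implements the Koenker (studentized) form of the Breusch-Pagan test, regressing the squared residuals on the fitted values alone. The choice of test and the function name `breusch_pagan_fitted` are assumptions made for illustration; only the Python standard library is used.

```python
import math

def breusch_pagan_fitted(residuals, fitted):
    """Koenker-style Breusch-Pagan LM test with fitted values as the only regressor.

    Returns (lm_statistic, p_value). Under constant variance the statistic is
    approximately chi-square distributed with 1 degree of freedom.
    """
    n = len(residuals)
    squared = [e * e for e in residuals]
    mean_sq = sum(squared) / n
    mean_fit = sum(fitted) / n

    # Sums of squares and cross-products for the auxiliary regression
    sxy = sum((f - mean_fit) * (s - mean_sq) for f, s in zip(fitted, squared))
    sxx = sum((f - mean_fit) ** 2 for f in fitted)
    syy = sum((s - mean_sq) ** 2 for s in squared)
    if sxx == 0 or syy == 0:
        # No variation to explain: no evidence against constant variance
        return 0.0, 1.0

    r_squared = (sxy * sxy) / (sxx * syy)  # R^2 of a one-regressor OLS fit
    lm = n * r_squared
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value
```

A small p-value (for example below 0.05) would indicate that the residual variance changes with the predicted values; a large one is consistent with constant variance.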
Normality Tests: Shapiro-Wilk Test Statistic: , Kolmogorov-Smirnov Test Statistic:
0.0361. Because the sample size is 11,259, above the roughly 5,000 observations beyond which
Shapiro-Wilk p-values become unreliable, the Kolmogorov-Smirnov test is the more suitable check of
residual normality here. Both tests suggest that the residuals approximately follow a normal
distribution, although the tails are heavier than those of a normal distribution.
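For reference, the Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical distribution of the residuals and the normal distribution function. The sketch below computes it with the standard library only. It assumes the residuals are standardized with their own sample mean and standard deviation before the comparison, which the report does not state; the function name `ks_statistic_normal` is illustrative.

```python
import math
from statistics import NormalDist

def ks_statistic_normal(residuals):
    """Kolmogorov-Smirnov distance between standardized residuals and N(0, 1)."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    z = sorted((r - mean) / sd for r in residuals)

    std_normal = NormalDist()
    d = 0.0
    for i, value in enumerate(z, start=1):
        cdf = std_normal.cdf(value)
        # The empirical CDF jumps from (i - 1) / n to i / n at this point,
        # so check the gap on both sides of the jump
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d
```

A value close to 0 means the empirical and normal distribution functions nearly coincide; larger values indicate a larger departure from normality.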
Independence Test: Durbin-Watson Statistic: 1.9641. This value is close to 2, indicating little to no
first-order autocorrelation, so the residuals can be treated as essentially independent of each other.
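The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation. A minimal sketch follows; the function name is illustrative and the residuals are assumed to be in their original observation order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator
```

Values well below 2 point to positive autocorrelation (neighbouring residuals move together), and values well above 2 point to negative autocorrelation (neighbouring residuals alternate in sign).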
Overall, the residual analysis supports the assumptions of the regression model, apart from a slight
departure from normality in the form of thicker tails in the residual distribution.
3.3.3 Multicollinearity
To address multicollinearity, we computed the Variance Inflation Factor (VIF) for each variable and
iteratively removed the variable with the highest VIF until every remaining variable, except the
constant term, had a VIF below 10.
• Before Removal: R-squared: 0.721; Adjusted R-squared: 0.719; Test R-squared: 0.7207
• After Removal: R-squared: 0.718; Adjusted R-squared: 0.716; Test R-squared: 0.7192
The removed variables include:
• property_type_Shared room in home
• property_type_Entire rental unit
• room_type_Private room
• neighbourhood_cleansed_Sydney
These variables were highly correlated with variables that were retained. For instance:
• property_type_Shared room in home was correlated with room_type_shared_room.
• property_type_Entire rental unit was correlated with room_type_Entire_home/apt.
• room_type_Private room was strongly correlated with the various private-room subcategories under
property_type.
Removing these variables reduced multicollinearity without significantly affecting the model’s
explanatory power, as shown by the small changes in the R-squared values.
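The iterative VIF procedure described in this section can be sketched as follows. For each variable, the VIF is 1 / (1 - R²), where R² comes from regressing that variable on all the others plus a constant. This is an illustrative NumPy version, not the code used for the report: the function names `vif_scores` and `drop_high_vif` are hypothetical, the design matrix is assumed to contain no constant column and no zero-variance columns, and the threshold of 10 follows the text.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of every column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    scores = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on a constant plus all remaining columns
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        scores.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return scores

def drop_high_vif(X, names, threshold=10.0):
    """Drop the highest-VIF column repeatedly until all VIFs are below threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    dropped = []
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] < threshold:
            break
        dropped.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)
    return names, dropped
```

Dropping one variable at a time and recomputing matters because removing a single variable can lower the VIF of every variable it was correlated with.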
4 Conclusion
Our analysis of Airbnb listings in Sydney offers several key insights:
1. Neighborhood Analysis: Manly has the highest number of listings, and most listings are
concentrated in the bay area. The priciest accommodations are also predominantly located in this
region. Safety varies by neighborhood, with theft being the most common crime, and safer
neighborhoods generally have higher review ratings.
2. Accommodation Types: Entire homes/apartments dominate the market, while hotels are the least
economical choice on Airbnb. Property type analysis reveals a mix of accommodations, with
some misclassification observed.
3. Pricing Patterns: Average prices fluctuate throughout the year, with noticeable declines during
certain periods, likely because hosts have not yet updated their prices for future dates.